Received: from boojum.CS.Arizona.EDU by optima.CS.Arizona.EDU (5.65c/15) via SMTP id AA03754; Tue, 20 Apr 1993 21:50:42 MST
Date: Tue, 20 Apr 1993 21:50:41 MST
From: "Rick Snodgrass" <rts>
Message-Id: <199304210450.AA19123@boojum.cs.arizona.edu>
Received: by boojum.cs.arizona.edu; Tue, 20 Apr 1993 21:50:41 MST
To: tsql@cs.arizona.edu
Subject: benchmark schema and instance

There has been remarkably little traffic on the tsql mailing list concerning "key" and "groupness" in the benchmark. Given that the workshop is less than two months away, it is imperative that we finalize the schema and the instance, achieve consensus on the taxonomy, and move on to actually listing the queries.

Let me attempt to summarize the state of the discussion. Jim and Christian, if I have misinterpreted anything, please let me know.

A. The database instance should accord with ALL AND ONLY those constraints which are explicitly stated. One constraint held by the data that was not stated was that the keys were time invariant. Christian mentioned two other such constraints: future information was not included and continuously varying attributes weren't included. One can envision all sorts of implicit constraints that are present in the data: the cardinality of the Name attribute must be two; salaries must be monotonically increasing; employees must not return to previous departments; no skills can be shared by two employees, etc. I don't believe a specific instance can satisfy all and only a (finite) set of stated constraints.

B. The term "key" as used may be defined as "at all times t, the attribute is a key of the snapshot of the relation at time t". This is my rephrasing of Jim's stated definition. There seems to be consensus on this definition of the term "key".

C. Keys in the instance are time-invariant. This is certainly true in the current instance. Christian lists four different alternatives, one being retaining this constraint in the instance, and making it explicit.
Jim advocates dropping this constraint and modifying the instance by allowing *ED* to change his name on 1/1/88 to "Edward".

D. Grouped models have some very nice properties with regard to updating and querying. Jim's description of the saga of *ED* is the best I've seen on why grouped models are so nice. If and when they are supported by temporal query languages, many advantages would accrue. Jim's argument is that we should drop the time-invariant constraint on keys in the benchmark instance to allow these nice properties to be demonstrated. There are, however, some disadvantages to dropping this constraint.

1. It makes the instance more complicated, though only marginally so.

2. Quoting from Jim's book chapter (p. 532), "we did not find here, nor are we aware of, *any* complete algebra for grouped historical data models." I personally have some doubts that a complete algebra that is also efficient is even possible.

3. The only calculus I (Rick) am aware of that is grouped complete is the L_h calculus with which grouped completeness is defined. If this is true, then no current SQL or Quel-based temporal language will be able to demonstrate the advantages of grouping on the proposed instance. Whether a reasonable SQL extension that is grouped complete is possible is not known.

4. As Christian has noted, and as Jim's analysis of simulating grouping in an ungrouped model shows, grouping relies heavily on the notion of surrogate. Such a notion is entirely absent from SQL2. Doing surrogates right in my opinion leads one to an identity-based model, as opposed to SQL2's value-based model.

I advocate being quite focussed, and not trying to do everything at once, either in the benchmark or in the initial SQL2 design. The benchmark is to be used to evaluate proposed extensions to SQL2. Aspects that require nontemporal extensions to SQL2 may be very attractive, but are not within the purview of this initial effort, in my opinion.
For example, an extension that requires recursive queries, or that requires subclassing, takes us far afield.

Making the change to the instance of allowing time-varying keys would invalidate all current approaches to SQL-based query language design. Such a change may in the long run be exactly the right way to go, but it cannot at this point in time be considered to be part of existing infrastructure. There are simply too many unanswered questions, and too great an incompatibility with SQL2.

In conclusion, I strongly advocate leaving the instance as is, and stating explicitly the constraint that keys be time-invariant in this schema. I also encourage further research into grouping, so that its many apparent advantages can be later incorporated into a consensus query language.
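
To make the distinction between points B and C concrete, here is a minimal sketch in Python of the two checks. It is an illustration only, not part of the benchmark: the row layout, the surrogate column, the employee "Di", and every date except the 1/1/88 change of name to "Edward" are assumptions made up for the example, as are the function names.

```python
from datetime import date

# Each row: (surrogate, name, valid_from, valid_to), valid over [valid_from, valid_to).
# Hypothetical data; only the 1/1/88 rename to "Edward" comes from the discussion.
ROWS = [
    (1, "Ed", date(1982, 1, 1), date(1988, 1, 1)),
    (1, "Edward", date(1988, 1, 1), date(1990, 1, 1)),
    (2, "Di", date(1982, 1, 1), date(1990, 1, 1)),
]


def snapshot(rows, t):
    """Rows valid at time t."""
    return [r for r in rows if r[2] <= t < r[3]]


def is_snapshot_key(rows, attr=1):
    """Point B: at all times t, attr is a key of the snapshot at time t.

    A duplicate can only first appear when some row starts, so it is
    enough to check the snapshots taken at each valid_from.
    """
    for t in sorted({r[2] for r in rows}):
        values = [r[attr] for r in snapshot(rows, t)]
        if len(values) != len(set(values)):
            return False
    return True


def is_time_invariant(rows, attr=1):
    """Point C: each entity keeps the same attr value throughout.

    Entities are identified here by the assumed surrogate column.
    """
    first_value = {}
    for r in rows:
        if first_value.setdefault(r[0], r[attr]) != r[attr]:
            return False
    return True


print(is_snapshot_key(ROWS))    # Name is unique in every snapshot
print(is_time_invariant(ROWS))  # but the rename changes entity 1's key value
```

With the rename included, Name remains a key in the sense of point B while failing the time-invariance constraint of point C. Note that the second check cannot even be written without some surrogate to say which rows describe the same employee, which is the dependence raised in disadvantage 4.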